Abstract: A technique for classifying objects based on modeling
the transient characteristics of their impulse response is
developed and tested. A set of targets identical in geometry but
differing in shell and filler material was constructed. The targets
were struck manually, exciting an impulse response that was
sampled and recorded. The impulse response of each target was
decomposed  via  windowed  short-time  Fourier  transform  into  a 
set  of  feature  vectors.  The  feature  vectors  were  quantized  via 
the  LBG  VQ  algorithm,  and  the  sets  of  quantized  vectors  were 
used  to  estimate  the  parameters  of  a  discrete-output  hidden 
Markov  model  (HMM)  for  each  class  of  object.  A  blind  test 
set  was  evaluated  against  the  trained  HMMs  and  the  results  are 
presented  along  with  a  discussion  of  the  generalization  ability  of 
the  individual  classifiers. 
I.  Introduction

The acoustic impulse response of a geometrically complicated
object is composed of many sinusoidal modes,
each  with  different  damping  coefficients  and  fundamental 
frequencies.  This  impulse  response  can  be  difficult  to  predict 
if  the  object’s  dimensions  cannot  be  precisely  predicted,  the 
construction  materials  are  not  homogeneous,  or  the  impinging 
location  of  the  excitation  pulse  is  highly  variable.  However, 
repeatable  time-evolving  frequency  features  can  be  observed 
in  the  impulse  response  even  when  the  underlying  generating 
physical  phenomena  are  not  well  understood. 

Acoustic signal pattern classification techniques such as
those used in speech recognition are a natural fit for classifying
underwater objects using these time-evolving frequency
features. Among these pattern classification methods, hidden
Markov model (HMM) classifiers have been used to successfully
classify speech signals for many years [1], [2]. In many
speech applications, the speech signal is modeled as a
concatenation of primitive speech elements called phonemes and
the classification task is to estimate the transition probabilities
of these underlying primitives and encode them as states in
the HMM.

The  approach  described  in  this  paper  similarly  treats  the 
time-frequency decomposition of the acoustic impulse response
as a time-evolving set of emissions from an underlying
hidden state sequence that is unique to each class of object.
This approach has been successfully used in other underwater
acoustic signal classification tasks in discriminating tonal
signals from chirp and continuous-wave pulses [3]. Additionally,
rather than use a standard time-frequency decomposition of
the acoustic return, others have used different basis functions
matched to predicted scattering wave physics as feature inputs
to HMM classifiers [4], [5].

Fig. 1. Graphical representation of a left-right hidden Markov model. The
nodes labeled with variables S1, S2, and S3 represent the underlying state
sequence. The nodes labeled O1, O2, and O3 represent the possible output
states.

The  following  sections  briefly  introduce  the  reader  to  the 
HMM  classifier  and  describe  the  acoustic  impulse  response 
experiment,  classifier  training  and  evaluation,  and  test  results. 

II.  Hidden  Markov  Model  Classifier 

A discrete-output hidden Markov model is defined by three
parameters: A, the state transition matrix; B, the state
emission probability matrix; and $\pi$, the vector of initial state
probabilities. As the term Markov implies, the conditional
probability of transition from the current state $s_1$ to state $s_2$,
$P(S = s_2 \mid s_1)$, depends solely on the current state $s_1$. These
discrete transition probabilities populate the state transition
matrix A, where the entry $a_{ij}$ is the conditional probability
$P(S = s_j \mid s_i)$. If the labels of each state are known by
observing the data, A can be estimated directly. However, usually the
labels of each state are unknown or "hidden" and the entries
of A are estimated by iterative Expectation-Maximization
techniques such as the Baum-Welch algorithm [1]. The state
emission probabilities can be modeled as either discrete or
continuous random variables conditioned on the underlying
state. In the discrete case, B is a matrix where entry $b_{ji}$ is the
conditional probability of emitting observation $o_i$ given state
$s_j$, or $P(o_i \mid S = s_j)$.

Figure 1 illustrates the concepts described in the previous
paragraph. The nodes labeled S1, S2, and S3 in the top of
the figure represent the underlying state sequence. The state
transition probabilities are denoted by $a_{11}$, $a_{12}$, $a_{22}$, $a_{23}$,
and $a_{33}$. The state emission probabilities for the observations
O1, O2, and O3 are denoted $b_{11}$, $b_{22}$, and $b_{33}$ respectively.
(Note that the emission probabilities are depicted only for the
given observation sequence O1, O2, and O3; the other emission
probabilities are not shown. Had the output sequence been
ordered O2, O3, O1, the emission probabilities would be labeled
$b_{12}$, $b_{23}$, and $b_{31}$ respectively.)

Additionally,  this  particular  figure  illustrates  a  subclass  of 
HMMs  that  are  called  left-right  HMMs.  In  a  left-right  HMM 
a  current  state  can  only  transition  to  another  state  with  an 
index greater than or equal to the current index. The left-right
constraint is a realistic assumption when previous states
cannot be revisited due to a phonetic or physical limitation. For
example, when modeling the utterance of the word "cat", it is
a reasonable assumption that the state that emits the primitive
associated with the hard /k/ sound cannot follow the state
that emits the primitive associated with the /a/ sound. This
assumption  reduces  the  number  of  entries  in  the  A  matrix, 
thus  simplifying  the  estimation  process. 
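
As a rough illustration of the left-right constraint, the sketch below builds a randomly initialized transition matrix in which every entry below the diagonal is zero. The function name, the seed, and the small positive offset are illustrative assumptions, not details taken from the experiment.

```python
import random

def random_left_right_transitions(n_states, seed=0):
    """Random row-stochastic matrix with zeros below the diagonal."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_states):
        # State i may only stay put or move to a higher-indexed state.
        # The 1e-3 offset (an assumption) keeps every allowed entry positive.
        weights = [rng.random() + 1e-3 if j >= i else 0.0
                   for j in range(n_states)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

A = random_left_right_transitions(3)
```

For n states this leaves n(n+1)/2 entries to estimate instead of n squared, and the last row always reduces to a single self-transition of probability 1.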

In a typical speech processing application, many utterances
of a word or sequence of words are digitally sampled and
recorded. The time series for each utterance is in turn parsed
into a series of representative features such as spectral or
linear predictive coding coefficients to form a sequence of
N observation vectors $O = \{o_1, o_2, \ldots, o_N\}$. To train the
HMM, maximum likelihood estimates of the parameters A,
B, and $\pi$ are found that maximize the product of evaluation
probabilities, $\prod_{m=1}^{M} P(O_m \mid A, B, \pi)$, across the ensemble of M
observation sequences, where $P(O \mid A, B, \pi)$ is the probability
that an observation sequence O is generated by the HMM with
parameters A, B, and $\pi$. Once the appropriate number of
HMMs have been trained, new observation sequences can
be evaluated for classification. When evaluating an unknown
observation sequence O against a set of k trained HMMs,
O is associated with the HMM that evaluates to the largest
probability $P(O \mid A_k, B_k, \pi_k)$ over all k.
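
The evaluation probability is commonly computed with the forward recursion covered in [1]. The following is a minimal sketch of a scaled version for a discrete-output HMM, followed by the largest-probability decision rule. The function names and the convention that B is indexed as B[state][symbol] are assumptions made for this example.

```python
import math

def log_likelihood(obs, A, B, pi):
    """Return ln P(O | A, B, pi) using the scaled forward recursion.

    obs: list of symbol indices; A[i][j] = P(next state j | state i);
    B[i][k] = P(symbol k | state i); pi[i] = P(first state is i).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through A, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

def classify(obs, models):
    """models maps a class label to an (A, B, pi) tuple."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))
```

Working with log probabilities leaves the decision unchanged, because the logarithm is monotonic, and avoids numerical underflow on longer sequences.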

This section briefly described the key HMM parameters
and equations that are directly related to the experiment
in this paper. For an in-depth review of HMM parameter
estimation via the Baum-Welch algorithm and the algorithmic
steps for evaluating the expression $P(O \mid A, B, \pi)$, the reader
is directed to Rabiner’s excellent tutorial and Rabiner and
Juang’s textbook listed in the bibliography [1], [2].

III.  Acoustic  Impulse  Classification 

For the experiment, a set of 9 geometrically identical objects
was constructed by placing end caps on hollow cylinders
made of three different shell materials, which were filled to
capacity with three different filler materials. The different
object classes will hereafter be labeled according to a two-number
system, where Object ij means the object with shell material
Type i and filler material Type j. The shell and filler materials
are numbered from 1 to 3 in order of increasing density. To
generate a data set for training and evaluating each object’s
HMM, each object is struck 15 times near its lengthwise
midpoint and the acoustic response is sampled and recorded
following each strike. Each time series is then preprocessed
and parsed into a set of quantized feature vectors using
time-frequency decomposition followed by vector quantization. For
the task of training each HMM, a subset of 8 time series is
used to estimate the HMM parameters for each object class.
For the case of object classification, an unknown test pattern is
associated with the object whose HMM evaluates to the largest
probability over the set of all possible HMMs. Figure 2 depicts
the block diagram of the evaluation process described above.

Fig. 2. Block diagram of Acoustic Impulse Classification. A time series is
preprocessed, decomposed via windowed STFT and quantized, then evaluated
against several trained HMMs. The class label of the HMM that evaluates to
the highest probability is assigned to the input observation sequence.

A.  Preprocessing 

Each  time  series  is  prepared  for  time-frequency  decompo¬ 
sition  and  vector  quantization  by  three  preprocessing  steps. 
Since  most  of  the  discriminating  target  information  was  found 
to be in lower frequency bands, each time series is first filtered
and downsampled to simplify computation and storage
requirements. Second, each downsampled time series is normalized
to  unit  energy.  The  striking  energy  for  each  generated  time 
series  is  highly  variable,  thus  making  total  energy  of  the 
acoustic  impulse  response  an  unreliable  discriminating  feature. 
Finally, the "dead zone" that exists before the strike occurs is
removed  by  ignoring  samples  until  the  total  integrated  energy 
rises  above  a  certain  threshold. 
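
The three preprocessing steps can be sketched as follows. The filter type, the downsampling factor, and the energy threshold are not given in the text, so the block-averaging filter and the default values of `factor` and `threshold` are placeholders chosen only for illustration.

```python
def downsample(x, factor):
    """Crude low-pass filter plus decimation: average each block of
    `factor` samples (an assumed stand-in for the unspecified filter)."""
    n_blocks = len(x) // factor
    return [sum(x[b * factor:(b + 1) * factor]) / factor
            for b in range(n_blocks)]

def normalize_energy(x):
    """Scale the record so that the sum of squared samples equals 1."""
    energy = sum(v * v for v in x)
    if energy == 0.0:
        raise ValueError("all-zero record")
    scale = energy ** -0.5
    return [v * scale for v in x]

def trim_dead_zone(x, threshold):
    """Drop leading samples until the cumulative energy exceeds threshold."""
    running = 0.0
    for i, v in enumerate(x):
        running += v * v
        if running > threshold:
            return x[i:]
    return []

def preprocess(x, factor=4, threshold=1e-3):
    # Order follows the text: filter and downsample, normalize, then trim.
    return trim_dead_zone(normalize_energy(downsample(x, factor)), threshold)
```

Because the record is normalized before trimming, the threshold acts as a fraction of the total energy and does not depend on how hard the object was struck.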

B.  Time-Frequency  Feature  Vectors 

As  shown  in  Figure  3,  the  time-frequency  plot  of  an  object’s 
acoustic  impulse  response  is  composed  of  several  clustered 
distinct  events  with  varying  frequency  and  time  characteristics. 
This  plot  gives  insight  into  why  one  would  use  a  time-evolving 
stochastic  model  such  as  an  HMM  to  discriminate  between 
such  signals.  These  time-frequency  events  are  unique  to  each 
class  of  object  and  can  be  modeled  directly  by  the  state- 
sequence  transition  and  state  emission  matrices  of  the  HMM. 

To  put  the  time-frequency  information  into  a  format  that  is 
compatible  with  discrete-output  HMM  training  and  evaluation, 
a  windowed  short-time  Fourier  transform  (STFT)  followed 
by  vector  quantization  is  performed  on  each  time  series. 
Following  preprocessing,  the  time  series  data  is  parsed  into 
40  overlapping  frames  of  100  time  samples.  Each  frame  is 
then  windowed  by  a  Gaussian  window  to  reduce  spectral 
leakage  and  the  DFT  magnitude  of  each  frame  is  recorded. 
Because the time series signal is real, only the
first 50 DFT magnitude coefficients are retained; the
remaining 50 are redundant. In more compact notation, the
observation sequence O is encoded as a set of 40 vectors of
DFT magnitude, $O = \{o_1, o_2, \ldots, o_{40}\}$, where 40 is the number
of STFT frames.
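
A minimal sketch of this feature extraction is given below. The frame length, frame count, and number of retained coefficients come from the text; the hop size between frames and the width of the Gaussian window are not stated, so `hop=25` and `sigma_frac=0.4` are assumptions, as is the zero-padding of short records.

```python
import cmath
import math

FRAME_LEN = 100   # samples per frame (from the text)
N_FRAMES = 40     # frames per record (from the text)
N_BINS = 50       # retained DFT magnitude coefficients (from the text)

def gaussian_window(length, sigma_frac=0.4):
    """Gaussian taper; sigma_frac (standard deviation as a fraction of
    the half-length) is an assumed value."""
    half = (length - 1) / 2.0
    return [math.exp(-0.5 * ((n - half) / (sigma_frac * half)) ** 2)
            for n in range(length)]

def dft_magnitudes(frame, n_bins):
    """Magnitudes of the first n_bins DFT coefficients, by direct summation."""
    n = len(frame)
    mags = []
    for k in range(n_bins):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def stft_features(x, hop=25):
    """Return N_FRAMES vectors of N_BINS magnitudes; hop is assumed."""
    needed = (N_FRAMES - 1) * hop + FRAME_LEN
    if len(x) < needed:
        x = list(x) + [0.0] * (needed - len(x))  # zero-pad short records
    window = gaussian_window(FRAME_LEN)
    features = []
    for f in range(N_FRAMES):
        frame = [x[f * hop + n] * window[n] for n in range(FRAME_LEN)]
        features.append(dft_magnitudes(frame, N_BINS))
    return features
```

The direct summation is used only to keep the sketch dependency-free; an FFT routine would give the same magnitudes faster.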


Fig. 4. Stacked bar graph of the training objective function values for a given
number  of  HMM  hidden  states.  The  size  of  each  element  in  the  stack  indicates 
how  well  the  trained  HMM  discriminates  between  objects. 


Fig.  3.  Time-Frequency  decomposition  of  an  acoustic  impulse  response. 
Note  the  several  distinct  events  identified  by  the  white  arrow. 


In a discrete-output HMM, the emission of the underlying
states has a discrete probability distribution, meaning each
vector in the sequence O must be quantized into a representative
symbol that relates the underlying DFT magnitude vector to
a possible output of the hidden state. To accommodate this
requirement, the LBG vector quantization (VQ) algorithm was
used to quantize the observation feature vector space into 32
discrete output symbols for all possible targets using the 2880
feature vectors (9 objects × 8 training sequences per object ×
40 vectors per sequence) generated from the training sequences
of time series data [6].
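
For readers unfamiliar with LBG, the sketch below shows the usual binary-splitting form of the algorithm: start from the global centroid, split every codeword into a perturbed pair, and refine with nearest-neighbor reassignment. The perturbation size `eps`, the fixed iteration count, and the handling of empty cells are assumptions for this example rather than settings reported in the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codeword closest to vec in Euclidean distance."""
    return min(range(len(codebook)), key=lambda c: sq_dist(vec, codebook[c]))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(vectors, n_codewords, eps=0.01, n_iter=20):
    """LBG by binary splitting; n_codewords should be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < n_codewords:
        # Split every codeword into a slightly perturbed pair.
        codebook = ([[v * (1 + eps) for v in c] for c in codebook] +
                    [[v * (1 - eps) for v in c] for c in codebook])
        for _ in range(n_iter):
            # Assign each vector to its nearest codeword, then re-center.
            cells = [[] for _ in codebook]
            for vec in vectors:
                cells[nearest(vec, codebook)].append(vec)
            # Keep the old codeword when a cell receives no vectors.
            codebook = [centroid(cell) if cell else old
                        for cell, old in zip(cells, codebook)]
    return codebook
```

With the sizes given above, the call would be `lbg(training_vectors, 32)`, where `training_vectors` is a hypothetical list holding the 2880 fifty-element magnitude vectors.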

C.  HMM  Training 

The quantized training sequences of feature
vectors are separated according to class and used to train
class-specific HMMs. Prior to parameter estimation, the entries of
A and B are initialized to random values and the $\pi$ vector is
assigned the value {1, 0, ..., 0} since it is a left-right HMM
and therefore must begin at the first state. What remains is
to specify the number of hidden states in the state sequence,
and thus the sizes of the A and B matrices, prior to maximum
likelihood estimation using the Baum-Welch algorithm. An
empirical method was used to determine the number of hidden
states by training each HMM using an increasing number of
hidden states until a training objective function increased to a
reasonable value and remained steady. The training objective
function for an HMM of the j-th class over a series of N
observations is defined as


$$
\Phi(j) = \frac{1}{N} \sum_{i=1}^{N} \min_{k \neq j} \left[ \ln P(O_i^{j} \mid \lambda_j) - \ln P(O_i^{j} \mid \lambda_k) \right]
$$

where $\lambda = \{A, B, \pi\}$ and $O_i^{j}$ is the i-th training sequence
of Object j. This function estimates the mean
value of the difference of log probabilities, over all training
sequences associated with Object j, between Object j’s
HMM and the HMM that gives the next largest output. This
function is more useful than strictly relying on the training
misclassification error, especially when the misclassification
error is extremely low and there are a small number of training
sequences. The larger the value, the better the trained HMM
is at correctly classifying the training sequences. A very small
or negative value of $\Phi(j)$ indicates the trained HMM has a
high misclassification rate with the training sequences.
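
The objective can be computed directly from a table of log evaluation probabilities: for each sequence, take the true-class log probability minus the largest log probability among the other models, then average. The sketch below assumes the log probabilities are already available as one dictionary per sequence; that data layout and the function name are illustrative choices.

```python
def training_objective(log_probs, true_class):
    """Mean margin between the true-class model and its strongest rival.

    log_probs: one dict per sequence of class `true_class`, mapping each
    class label k to ln P(O_i | lambda_k).
    """
    margins = []
    for row in log_probs:
        # Largest log probability among all models except the true one.
        best_rival = max(v for k, v in row.items() if k != true_class)
        margins.append(row[true_class] - best_rival)
    return sum(margins) / len(margins)

# Illustrative values only: the second sequence would be misclassified.
example = [
    {"11": -10.0, "12": -14.0, "13": -20.0},
    {"11": -12.0, "12": -11.0, "13": -30.0},
]
phi_11 = training_objective(example, "11")
```

A misclassified sequence contributes a negative margin, which is why a small or negative mean signals poor separation.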

The  vertically  stacked  bar  graph  in  Figure  4  depicts  the 
training  objective  function  values  for  a  given  number  of  HMM 
hidden  states  in  ascending  class  order  starting  at  object  class 
11  at  the  bottom  of  the  stack  and  ending  with  object  class 
33  at  the  top  of  the  stack.  As  the  number  of  hidden  states 
increases  from  1  to  4  the  overall  training  objective  function 
contributions  trend  upward.  Different  numbers  of  hidden  states 
were  chosen  to  set  HMM  training  parameters  for  each  object 
class  based  on  the  lowest  number  of  states  for  which  the  value 
of $\Phi(j)$ failed to increase appreciably. Based on this criterion,
Objects  11,  21,  22,  and  23  were  assigned  2  hidden  states,  and 
Objects  12,  13,  31,  32,  and  33  were  assigned  3  hidden  states. 
This  approach  favors  simpler  models  for  each  class,  which  is 
crucial  when  estimating  entries  in  the  state  transition  matrix 
A  in  a  scenario  with  so  few  training  examples.  Although  the 
plot  shows  objective  function  values  for  up  to  4  hidden  states, 
the $\Phi(j)$ values in column 4 become less reliable due to the
extra  training  sequences  needed  to  estimate  the  A  matrix.  No 
4-hidden-state  models  were  chosen  for  this  reason. 
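
The selection in the paper appears to have been made by inspecting Figure 4. One way such a rule could be automated is sketched below: choose the smallest state count for which adding one more state fails to raise the objective by a relative margin. The `min_gain` threshold and the example values are hypothetical and are not taken from the experiment.

```python
def choose_num_states(phi_by_states, min_gain=0.1):
    """Pick the lowest state count after which the objective stops rising.

    phi_by_states: dict mapping a number of hidden states to the objective
    value obtained with that many states. min_gain is the assumed minimum
    relative increase that counts as appreciable.
    """
    counts = sorted(phi_by_states)
    for current, nxt in zip(counts, counts[1:]):
        gain = phi_by_states[nxt] - phi_by_states[current]
        if gain < min_gain * abs(phi_by_states[current]):
            return current
    return counts[-1]

# Made-up objective values for one object class, 1 to 3 hidden states.
n_states = choose_num_states({1: 2.0, 2: 5.0, 3: 5.1})
```

Restricting the dictionary to 1 through 3 states mirrors the decision above to exclude 4-state models.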

While not mentioned above, the use of 1 hidden state
is a special case of the HMM where, predictably, A = 1,
and the evaluation probability of an observation sequence is
solely dependent upon its emission probabilities defined by
B. The classification task with single state HMMs is similar
to  other  single-input  single-output  classifiers  such  as  those 
implemented  by  neural  networks  or  Bayesian  classifiers.  As  is 
shown  in  Figure  4,  classification  performance  on  the  training 
examples  with  a  single  state  HMM  is  actually  quite  good.  This 
indicates  the  different  object  classes’  observation  sequences  do 
not  intersect  over  many  quantized  feature  vectors. 
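
For the single-state case the evaluation probability factorizes, because every frame is emitted by the only state. Writing $o_t$ for the quantized symbol at frame t and T for the number of frames (notation assumed here), this gives:

```latex
P(O \mid A, B, \pi) = \prod_{t=1}^{T} P(o_t \mid S = s_1),
\qquad
\ln P(O \mid A, B, \pi) = \sum_{t=1}^{T} \ln P(o_t \mid S = s_1)
```

The score therefore depends only on how often each symbol occurs in the sequence and not on the order in which the symbols appear.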

D.  HMM  Evaluation 

After  each  HMM  is  trained  by  estimating  the  parameters 
A  and  B  from  the  training  data,  the  classifier  is  ready  to 
receive  unknown  test  patterns.  Referring  again  to  Figure  2, 
the  time  series  data  is  preprocessed  by  downsampling  and 
then decomposed via a windowed STFT. The STFT frames
are quantized by assigning each frame the discrete symbol of
the VQ center, determined in Subsection III-B, that is closest
in Euclidean distance, forming the observation sequence
$O = \{v_1, v_2, \ldots, v_{40}\}$, where $v_i$ is the symbol associated with the
i-th VQ output cluster. This observation sequence is in turn used
to evaluate each trained HMM, and the object class label k that
evaluates to the largest probability in the evaluation equation
$P(O \mid A_k, B_k, \pi_k)$ is assigned to the unknown pattern.
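
The two evaluation steps, nearest-center quantization and the largest-probability decision, can be sketched as below. The `scorers` mapping is a hypothetical interface: each entry is assumed to be a function that returns the log evaluation probability of a symbol sequence under one trained HMM.

```python
def quantize(frames, centers):
    """Map each feature vector to the index of its nearest VQ center."""
    symbols = []
    for vec in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centers]
        symbols.append(dists.index(min(dists)))
    return symbols

def assign_label(symbols, scorers):
    """Return the class label whose model scores the sequence highest.

    scorers: dict mapping a class label to a callable that returns
    ln P(O | model) for a list of symbols (assumed interface).
    """
    return max(scorers, key=lambda label: scorers[label](symbols))

# Toy two-center codebook and three two-dimensional frames.
centers = [[0.0, 0.0], [1.0, 1.0]]
symbols = quantize([[0.1, 0.0], [0.9, 1.2], [0.4, 0.4]], centers)
```

Comparing squared distances gives the same nearest center as comparing Euclidean distances, so the square root is omitted.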

IV.  Results 

A set of 7 blind test examples per object class was evaluated
against the trained HMMs. The tabulated results of the object
classifications are presented as fractions of correct classifications to
total test sequences in Table I. While the results were good for
the test sequences listed in the table below, with so few test
examples it is difficult to draw meaningful conclusions about
the classifiers’ generalization abilities. As another measure of
performance, the calculation of the objective function $\Phi(j)$
was  repeated  for  the  test  patterns  to  create  the  stacked  bar 
graph  shown  in  Figure  5.  The  graph  is  plotted  using  the 
same  axes  of  Figure  4  to  give  some  intuition  of  how  robust 
each  trained  HMM  is  to  novel  test  patterns.  As  can  be 
expected,  the  margin  of  the  evaluation  probabilities  between 
correct  and  incorrect  classes  has  shrunk  considerably  across 
all class labels. Additionally, in the 4-state column the
performance behavior of some HMMs becomes erratic, suggesting
overtraining  or  lack  of  training  data  to  sufficiently  estimate 
the  HMM  parameters.  By  cross-comparing  the  classification 
results  in  one  cell  of  Table  I  to  the  size  of  the  corresponding 
bar  graph  element,  one  can  gain  a  sense  of  how  well  the  correct 
model  is  separated  from  the  other  models  for  a  given  set  of 
test sequences. For example, Object 11 was correctly classified
in all test cases and has a large bar graph element in the
2-state column of Figure 5. This suggests the mean separation of
evaluation probabilities was very large and consistent between
HMM 11 and all other HMMs for the given test set and also
implies good performance against new test patterns. However,
Object 31, which was also correctly classified in all test cases,
has a small bar element in the 3-state column of Figure 5.
This means there is little separation between the correct and
incorrect evaluation probabilities and this particular model
may not generalize well to new patterns.

TABLE I

Fraction of Correct Classification to Total Test Sequences

                 Shell Type 1    Shell Type 2    Shell Type 3
Filler Type 1        7/7             6/7             7/7
Filler Type 2        6/7             7/7             7/7
Filler Type 3        6/7             7/7             5/7

Fig. 5. Stacked bar graph of the test objective function values for a given number
of HMM hidden states. The size of each element in the stack indicates how
well the trained HMM discriminates between objects in the test set.

V.  Conclusions  and  Future  Work 

The  promising  results  from  this  paper  suggest  that  HMMs 
are  a  useful  pattern  classification  scheme  to  discriminate 
between  different  acoustic  returns  generated  by  striking  or 
other  impulse  excitation  methods.  Additionally,  this  approach 
worked  well  despite  the  restrictions  of  a  small  number  of 
training  sequences  and  the  identical  geometry  of  the  targets. 
Two important areas this study did not address are 1)
intraclass  feature  variation  and  2)  signature  aspect  dependency. 
It  is  useful  to  know  how  well  a  given  HMM  can  distinguish 
multiple  objects  of  the  same  class  from  other  object  classes 
and  what  features  have  a  low  within-class  variance.  By  using 
only  one  object  to  characterize  a  given  object  class,  it  is 
difficult  to  assert  that  another  HMM  trained  on  features 
gathered  from  multiple  objects  of  the  same  class  will  have  the 
same  discrimination  ability.  It  is  well  known  that  the  acoustic 
signature  of  an  object  is  aspect-dependent.  Multiaspect  HMM 
acoustic  classification  is  addressed  in  [4]  and  [5]  and  involves 
training  HMMs  with  acoustic  returns  taken  over  a  sequence  of 
different  aspects.  Addressing  these  two  concerns  for  the  case 
of  acoustic  impulse  response  classification  is  a  topic  of  future 
work. 

Finally, the time-frequency decomposition method
discussed in this paper was not the only one explored in the
course of this research. The wave-based matching pursuits algorithm
presented by McClure and Carin [7] was attempted with
varying degrees of success for this application. If the impulse
response can be calculated from a good understanding of
the generating physics and, more importantly, these calculated
effects can be observed in the striking experiments, then
these customized basis decompositions show great promise.
Future work will continue to explore the use of different
basis decompositions that more compactly define the acoustic
impulse response.